NooJ: a Linguistic Annotation System for Corpus Processing
Abstract
One characteristic of NooJ is that its corpus processing engine uses large-coverage lexical and syntactic resources. This allows NooJ users to perform sophisticated queries that include any of the available morphological, lexical or syntactic properties. Compared with INTEX, NooJ is built on a new technology (.NET), uses a new linguistic engine, and was designed with a new range of applications in mind.
Related resources
Using NooJ for semantic annotation of Italian language corpora in the domain of motion: a cognitive-grounded approach
In this paper we propose a system to parse and annotate motion constructions expressed in Italian language. We used NooJ as a software tool to implement finite-state transducers in order to recognize linguistic elements constituting motion events. In this paper we describe the model we adopted for semantic description of events (grounded on Talmy’s Cognitive Semantics theories) and then we illu...
Complex Annotations with NooJ
NooJ associates each text with a Text Annotation Structure, in which each recognized linguistic unit is represented by an annotation. Annotations store the position of the text units to be represented, their length, and linguistic information. NooJ can represent and process complex annotations, such as those that represent units inside word forms, as well as those that are discontinuous. We dem...
Automatic transcription of 17th century English text in Contemporary English with NooJ: Method and Evaluation
Since 2006 we have undertaken to describe the differences between 17th century English and contemporary English thanks to NLP software. Studying a corpus spanning the whole century (tales of English travellers in the Ottoman Empire in the 17th century, Mary Astell's essay A Serious Proposal to the Ladies and other literary texts) has enabled us to highlight various lexical, morphological or gra...
Sentence Classification and Clause Detection for Croatian
We present a method for classifying Croatian sentences by structure and detecting independent and dependent clauses within these sentences and provide its evaluation. A prototype system applying the method was implemented by using the NooJ linguistic development environment, both for purposes of this experiment and for further utilization in a prototype rule-based chunking and shallow parsing s...
Corpus Annotation By Generation
As the interest in annotated corpora is spreading, there is increasing concern with using existing language technology for corpus processing. In this paper we explore the idea of using natural language generation systems for corpus annotation. Resources for generation systems often focus on areas of linguistic variability that are under-represented in analysis-directed approaches. Therefore, ma...